2025-07-02
Source: Sora
Palaeo data come in many forms
Where we’ve sampled is almost surely not random: it is a convenience sample
It isn’t
Becomes a problem when we start to collate data into a large database
Even within a single proxy we have inconsistent data
No one in this room needs to be told any of this
Themes from the website
Traditional methods used in palaeo are unlikely to help with analyses that compare across more than two taxonomic groups
What do we do if we have different resolution data within a proxy? Or different data representations?
Can we use all the data?
Lots of developments in the statistical ecology and omics worlds we can take advantage of
integrated SDMs
joint species distribution models
Model-based ordination
Copula models (marginal models for multivariate responses)
…
Integrated species distribution models
General way to combine — integrate — disparate data
species’ distributions are aggregated spatial locations of all individuals of the same species across a geographical domain
the distribution can be described by a spatial point process, where local intensity (density) of individuals varies
SDMs are a direct or indirect model of this underlying point process
Data integration requires linking each data source to the common underlying point process while accounting for differences among data types
A spatial point process describes the distribution of event locations across some spatial domain
Random process generating points, described by the local intensity \(\lambda_{s}\)
\(\lambda_{s}\) — expected density of points at spatial location \(s\)
If points are independent, counts follow a Poisson distribution, and the intensity is constant (\(\lambda_{s} = \lambda \; \forall \; s\)), we have a homogeneous Poisson process
If \(\lambda_{s}\) varies across \(s\), we have an inhomogeneous Poisson process
Other distributions are available
These work in time as well
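A minimal sketch of the two kinds of Poisson process, on a one-dimensional domain (a transect, or time). The homogeneous case uses exponential gaps between points; the inhomogeneous case uses thinning, which is one standard way to simulate it. The intensity function, rates, and domain length are made-up values for illustration.

```python
import random


def simulate_homogeneous(rate, length, rng):
    """Simulate a homogeneous Poisson process on [0, length].

    Gaps between successive points are exponential with mean 1 / rate.
    """
    points = []
    s = rng.expovariate(rate)
    while s < length:
        points.append(s)
        s += rng.expovariate(rate)
    return points


def simulate_inhomogeneous(intensity, max_rate, length, rng):
    """Simulate an inhomogeneous Poisson process by thinning.

    Candidate points come from a homogeneous process with rate max_rate;
    each is kept with probability intensity(s) / max_rate.
    Requires intensity(s) <= max_rate everywhere on [0, length].
    """
    candidates = simulate_homogeneous(max_rate, length, rng)
    return [s for s in candidates if rng.random() < intensity(s) / max_rate]


def example_intensity(s):
    """Hypothetical intensity: rises linearly from 0 to 20 across [0, 10]."""
    return 2.0 * s


rng = random.Random(1)
homog = simulate_homogeneous(rate=5.0, length=10.0, rng=rng)
inhomog = simulate_inhomogeneous(example_intensity, max_rate=20.0, length=10.0, rng=rng)

# Expected counts are the integrals of the intensity: 5 * 10 = 50 and 100.
print(len(homog), len(inhomog))

# Under the rising intensity, most points fall in the second half of the domain.
late = sum(1 for s in inhomog if s > 5.0)
print(late / len(inhomog))
```

The expected number of points in any interval is the integral of \(\lambda_{s}\) over that interval, which is why the second process concentrates points where the intensity is high.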
Miller et al (2019). Methods Ecol. Evol. 10.1111/2041-210X.13110
The different data sets have their own “model” and the likelihoods are combined during fitting
Allows mixing of different types of data
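A toy sketch of combining likelihoods, assuming two data sources that observe the same constant intensity \(\lambda\): plot counts (Poisson) and presence/absence records (Bernoulli, with the probability of presence implied by the same Poisson process). The data, the plot areas, and the golden-section optimiser are illustrative assumptions, not taken from Miller et al (2019).

```python
import math


def loglik_counts(log_lam, counts, areas):
    """Poisson log-likelihood: count_i ~ Poisson(lambda * area_i)."""
    lam = math.exp(log_lam)
    total = 0.0
    for y, a in zip(counts, areas):
        mu = lam * a
        total += y * math.log(mu) - mu - math.lgamma(y + 1)
    return total


def loglik_presence_absence(log_lam, detections, areas):
    """Bernoulli log-likelihood implied by the same Poisson process.

    P(at least one individual in a plot) = 1 - exp(-lambda * area).
    """
    lam = math.exp(log_lam)
    total = 0.0
    for z, a in zip(detections, areas):
        p_present = -math.expm1(-lam * a)
        total += math.log(p_present) if z == 1 else -lam * a
    return total


def joint_loglik(log_lam, count_data, pa_data):
    """Sum of the per-source log-likelihoods, sharing one intensity."""
    return (loglik_counts(log_lam, *count_data)
            + loglik_presence_absence(log_lam, *pa_data))


def maximise(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximum of a unimodal function."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(c) > f(d):
            hi = d
        else:
            lo = c
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2.0


# Hypothetical data: (observations, plot areas) for each source.
count_data = ([3, 5, 2, 6], [1.0, 2.0, 1.0, 2.0])
pa_data = ([1, 0, 1, 1, 0, 1], [0.25] * 6)

lam_counts = math.exp(maximise(lambda b: loglik_counts(b, *count_data), -5.0, 5.0))
lam_pa = math.exp(maximise(lambda b: loglik_presence_absence(b, *pa_data), -5.0, 5.0))
lam_joint = math.exp(maximise(lambda b: joint_loglik(b, count_data, pa_data), -5.0, 5.0))
print(lam_counts, lam_pa, lam_joint)
```

Each source keeps its own observation model; only the intensity is shared, so the joint estimate is a compromise between the two single-source estimates.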
Similar idea to combine likelihoods from different types of data
gfam() family in the mgcv 📦
Miller et al (2019). Methods Ecol. Evol. 10.1111/2041-210X.13110
Instead of modelling one species at a time and stacking the models, Joint Species Distribution Models estimate all species at once
Ideally we’d combine integrated SDMs with JSDMs but as yet, I’m not aware of anything
JSDMs can be used to fit model-based ordinations; we might have to move away from traditional ordination methods to handle features of the data properly
Any modelling of “diversity” needs to handle the sediment accumulation problem
Time averaging different amounts of time per sample leads to
Same problem affects any modelling of any palaeo data, save for annually laminated records…
Rare or data-deficient species?
Large training sets — throw out rare species, singletons etc
eDNA — “filtering” throws away a lot of data (& please don’t rarefy to counts)
Hierarchical models involving random “effects” allow us to borrow strength from more data-rich taxa
Sharma et al (in press). No species left behind: borrowing strength to map data-deficient species. Trends Ecol. Evol. 10.1016/j.tree.2025.04.010
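A rough illustration of borrowing strength, using a conjugate Beta-Binomial model as a simple stand-in for a hierarchical model with random effects. This is not the method of Sharma et al; the taxa, counts, and the shared Beta(3, 7) prior are all hypothetical, and in practice the prior would be estimated from the data-rich taxa.

```python
def pooled_estimates(occurrences, samples, prior_a, prior_b):
    """Posterior mean occurrence rate per taxon under a shared Beta prior."""
    return [
        (y + prior_a) / (n + prior_a + prior_b)
        for y, n in zip(occurrences, samples)
    ]


# Hypothetical data: two well-sampled taxa and one found in 1 of only 2 samples.
taxa = ["common_a", "common_b", "rare_c"]
occurrences = [70, 36, 1]
samples = [200, 150, 2]

# Unpooled estimates: each taxon on its own.
raw = [y / n for y, n in zip(occurrences, samples)]

# Partially pooled estimates: the shared prior has mean 3 / (3 + 7) = 0.3.
shrunk = pooled_estimates(occurrences, samples, prior_a=3.0, prior_b=7.0)

for name, r, s in zip(taxa, raw, shrunk):
    print(f"{name}: raw={r:.3f} pooled={s:.3f}")
```

The data-rich taxa barely move, while the rare taxon's estimate is pulled strongly toward the group mean, so it can stay in the analysis rather than being thrown out.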
If we can’t / don’t want to use these newer methods, what can we do with dissimilarities?
Fused dissimilarities
Then analyse using NMDS or db-RDA, etc.
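A minimal sketch of one way to fuse dissimilarities, assuming the fused matrix is a weighted average of the per-proxy matrices after each has been scaled by its maximum, so that no proxy dominates because of its units. The two matrices and the `fuse` helper are hypothetical.

```python
def fuse(dissimilarities, weights=None):
    """Weighted average of square dissimilarity matrices, each scaled to max 1."""
    k = len(dissimilarities)
    if weights is None:
        weights = [1.0] * k
    total_w = sum(weights)
    n = len(dissimilarities[0])
    fused = [[0.0] * n for _ in range(n)]
    for d, w in zip(dissimilarities, weights):
        d_max = max(max(row) for row in d)
        if d_max <= 0:
            raise ValueError("each matrix needs at least one positive dissimilarity")
        for i in range(n):
            for j in range(n):
                fused[i][j] += (w / total_w) * d[i][j] / d_max
    return fused


# Hypothetical dissimilarities among the same three samples, from two proxies
# measured on very different scales.
pollen = [
    [0.0, 0.4, 1.2],
    [0.4, 0.0, 0.8],
    [1.2, 0.8, 0.0],
]
diatoms = [
    [0.0, 30.0, 10.0],
    [30.0, 0.0, 40.0],
    [10.0, 40.0, 0.0],
]

fused = fuse([pollen, diatoms])
for row in fused:
    print([round(x, 3) for x in row])
```

The result is a single symmetric dissimilarity matrix for the shared samples, which can be passed to whichever distance-based method is being used.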
Over in the Omics cinematic universe, those folks are doing their own thing integrating disparate kinds of data
Popular techniques are focused around extensions to PLS
Multiple different types of omics analysis on the same samples
What if we don’t have the same proxies measured at the same set of sites? — spatial misalignment
What if proxies represent different amounts of space (time)?
This is covered under the problem of change of support and the concept of data fusion
Source: Giphy
In hindsight palaeoecologists could have been doing things very differently 50 or 100 years ago, which would’ve been real useful to us now
How would we change the field today to make our future lives better when Palaeopen 2.0 comes around?
Very hard to say diatom species x went extinct from this lake at this time
Most palaeo data is presence only
Possibly with associated marks — abundance or biomass conditional upon the taxon being found
We don’t know things about the taxa we don’t find
Hard to put a probability on (e.g.) extinction with this data
But ecologists have been doing this kind of work for decades — occupancy modelling
Most methods require repeated sampling
What would that look like for palaeo?
Could we count same number of things but over \(n \geq 2\) different “samples”?
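A sketch of why repeated sampling matters, using the basic single-season occupancy model: a taxon is present with probability \(\psi\) and, if present, is detected in each replicate with probability \(p\). Treating replicate counts from one sediment level as the "repeat visits" is an assumption made here for illustration, as are the parameter values.

```python
def history_likelihood(n_detections, n_replicates, psi, p):
    """Likelihood of one detection history under the basic occupancy model.

    If the taxon was detected at least once it must be present; if it was
    never detected it is either present but missed every time, or absent.
    """
    missed = n_replicates - n_detections
    present_term = psi * p ** n_detections * (1.0 - p) ** missed
    if n_detections > 0:
        return present_term
    return present_term + (1.0 - psi)


def prob_present_given_never_detected(n_replicates, psi, p):
    """P(taxon present | zero detections in n_replicates replicates)."""
    missed_every_time = psi * (1.0 - p) ** n_replicates
    return missed_every_time / (missed_every_time + (1.0 - psi))


# Hypothetical values: occupancy 0.6, per-replicate detection probability 0.5.
for k in (1, 2, 4):
    print(k, prob_present_given_never_detected(k, psi=0.6, p=0.5))
```

With a single replicate, a non-detection leaves a substantial probability that the taxon was present but missed; each extra replicate shrinks that probability, which is what lets absence (or extinction) be given a probability at all.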
As we progress through Palaeopen, think about
what “future you” would’ve liked palaeoecologists of the past to have done
how would that change our field?
how do we achieve that?